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ABSTRACT 

We are looking for a mathematical model of monophonic sounds 
with independent time and phase dimensions. With such a model 
we can resynthesise a sound with arbitrarily modulated frequency 
and progress of the timbre. We propose such a model and show 
that it exactly fulfils some natural properties, like a kind of time- 
invariance, robustness against non-harmonic frequencies, envelope 
preservation, and inclusion of plain resampling as a special case. 
The resulting algorithm is efficient and allows to process data in 
a streaming manner with phase and shape modulation at sample 
rate, what we demonstrate with an implementation in the func- 
tional language Haskell. It allows a wide range of applications, 
namely pitch shifting and time scaling, creative FM synthesis ef- 
fects, compression of monophonic sounds, generating loops for 
sampled sounds, synthesise sounds similar to wavetable synthesis, 
or making ultrasound audible. 

1. INTRODUCTION 

An example of our problem is illustrated in Figure 1. Given is 
a signal of a monophonic sound of a known constant pitch. We 
want to alter its pitch and the progression of its waveshape inde- 
pendently, possibly time-dependent, possibly rapidly. The sound 
must not contain noise portions such as speech does. We also do 
not try to preserve formants, that is, like in resampling, we accept 
that the spectrum of harmonics is stretched by the same factor as 
the base frequency. E.g. a square waveform shall remain square 
and so on. For some natural instruments this is appropriate (e.g. 
guitar, piano) whereas for other natural sounds this is inappropriate 
(e.g. speech). 

The organisation of this article is inspired by [ ]. With the 
paper we like to contribute the following: 

• In Section 2.1 we specify our problem. In Section 2.2 
we propose a mathematical model for monophonic sounds 
given as real functions. This model untangles phase and 
time and allows us to describe frequency modulation and 
waveshape control. In Section 2.3 we show how we utilise 
this model for phase and time modification and we formu- 
late natural properties of this process. 

• Section 3 is dedicated to theoretical details. To this end 
we introduce some notations and definitions in Section 3.1 
and Section 3.2. We investigate the properties from Sec- 
tion 2.3 like time-invariance (Section 3.3.1), linearity (Sec- 
tion 3.3.2), preservation of static waves of the unit fre- 
quency (Section 3.3.3), preservation of pure sine waves 
and robustness against non-harmonic frequencies (Sec- 
tion 3.3.4), envelope preservation (Section 3.3.6), inclusion 
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Figure 1 : A typical use case of our method: From the above signal 
of a single tone we want to compute the signal below. That is, we 
want to alter the pitch while maintaining the progression of its 
waveshape and without knowing, how the signal was generated. 



of simple resampling and time warping as a special case 
(Section 3.3.7), and we prove that our model satisfies these 
properties exactly. That is, our method is altogether theo- 
retically sound. (I could not resist that pun!) As bonus 
we verified some of the statements using the proof assistant 
PVS in Section A. 

• The problems of handling discrete signals are treated in 
Section 4, including notes on the implementation in the 
purely functional programming language Haskell. 

• We suggest a range of applications of our method in Sec- 
tion 5. 

• In Section 6 you find a survey of related work and in Sec- 
tion 7 we compare some results of our method with the ones 
produced by the similar wavetable synthesis. 

• We finish our paper in Section 8 with a list of issues that we 
still need to work on. 

2. CONTINUOUS SIGNALS: OVERVIEW 
2.1. Problem 

If we want to transpose a monophonic sound, we could just play 
it faster for higher pitch or slower for lower pitch. This is how 
resampling works. But this way the sound becomes also shorter 
or longer. For some instruments like guitars this is natural, but 
for other sounds like that of a brass, it is not necessarily so. The 
problem we face is, that with ongoing time both the waveform and 
the phase within the waveform change. Thus we can hardly say, 
what the waveshape at a precise time point is. 
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Figure 2: The cylinder we map the input signal onto (black and 
dashed helix) and where we sample the output signal from (grey). 

If we could untangle phase and shape this would open a wide 
range of applications. We could independently control progress of 
phase (i.e. frequency) and progress of the waveshape. 

2.2. Model 

The wish for untangled phase and shape leads us straight forward 
to the model we want to propose here. If phase and shape shall 
be independent variables of a signal, then our signal is actually a 
two-dimensional function, mapping from phase and shape to the 
(particle) displacement. Since the phase tp is a cyclic quantity, the 
domain of the signal function is actually a cylinder. For simplicity 
we will identify the time point t in a signal with the shape param- 
eter. That is, in our model the time points to the instantaneous 
shape. 

However, we never get signals in terms of a function on 
a cylinder. So, how is this model related to real-word one- 
dimensional audio signals? According to Figure 2 the easy direc- 
tion is to get from the cylinder to the plain audio signal: We move 
along the cylinder while increasing both the phase and shape pa- 
rameter proportionally to the time in the audio signal. This yields 
a helical path. The phase to time ratio is the frequency, the shape 
to time ratio is the speed of shape progression. The higher the ra- 
tio of frequency to shape progression, the more dense the helix. 
For constant ratio the frequency is proportional to the speed with 
which we go along the helix. We can change phase and shape 
non-proportionally to the time, yielding non-helical paths. 

When going from the one-dimensional signal to the two- 
dimensional signal, there is a lot of freedom of interpretation. We 
will use this freedom to make the theory as simple as possible. E.g. 
we will assume, that the one-dimensional input signal is an obser- 
vation of the cylindrical function at a helical path. Since we have 
no data for the function values beside the helix, we have to guess 
them, in other words, we will interpolate. 

This is actually a nice model that allows us to perform many 
operations in an intuitive way and thus it might be of interest be- 
yond pitch shifting and time scaling. 

2.3. Interpolation principle 

An application of our model will firstly cover the cylinder with 
data that is interpolated from a one-dimensional signal x by an 
operator F and secondly it will choose some data along a curve 
around that cylinder by an operator S. The operator that we will 
work with here has the structure 

Fx(t, tp) — x(tp + k) ■ n(t — tp — k) 

fcez 

where k is an interpolation kernel such as a hat function or a sinus 
cardinalis (sine). Intuitively spoken, it lays the signal on a helix on 
the cylinder. Then on each line parallel to the time axis there are 



equidistant discrete data points. Now, F interpolates them along 
the time direction using the interpolation kernel Hi. You may check 
that Fx(t, tp) has period 1 with respect to tp. This is our way to 
represent the radian coordinate of the cylinder within this section. 

The observation operator S shall sample along a helix with 
time progression v and angular speed a: 

Sy(t) = y{vt,a-t) . 

Interpolation and observation together, yield 

Mx(t) = S(Fx)(t) 

= x{a • t + k) ■ k((v — a) ■ t — k) 

feSZ 

This operator turns out to have some useful properties: 

1 . Time-invariance 

In audio signals often the absolute time is not important, 
but the time differences. Where you start an audio record- 
ing should not have substantial effects on an operation you 
apply to it. This is equivalent to the statement, that a delay 
of the signal shall be mapped to a delayed result signal. 
In particular it would be nice to have the property, that a 
delay of the input by v ■ t yields a delay by t of the out- 
put. However this will not work. To this end consider pure 
time-stretching (a = 1) applied to grains, and we become 
aware that this property implies plain resampling, which 
clearly changes the pitch. What we have at least, is a re- 
stricted time invariance: You have a discrete set of pairs of 
delays of input and output signal that are mapped to each 
other wherever the helices in Figure 2 cross, that is wher- 
ever (v — a) ■ t £ Z. 

However the construction F of our model is time invariant 
in the sense 

xi(t) = xo(t — r) 
=^Fxi(t,ip) = Fx (t-T,<p-r) . (1) 

2. Linearity 

Since both F and S are linear, our phase and time modifi- 
cation process is linear as well. This means that physical 
units and overall magnitudes of signal values are irrelevant 
(homogeneity) and mixing before interpolation is equiva- 
lent to mixing after interpolation (additivity). 

Homogeneity M(A ■ x) = \-Mx (2) 
Additivity M (x + z) = Mx + Mz (3) 

3. Resampling as special case 

We think, that pitch shifting and time scaling by factor 1 
should leave the input signal unchanged. We also think, 
that resampling is the most natural answer to pitch shifting 
and time scaling by the same factor a = v. For interpolat- 
ing kernels, that is k(0) = 1, Vj £ Z \ {0} : = 0, 
this actually holds. 

Mx(t) = x(v ■ t) 

4. Mapping of sine waves 

Our phase and time manipulation method maps sine waves 
to sine waves if the kernel is the sinus cardinalis nor- 
malised to integral zeros. 

Jl : t = 

K[t) ~ i : otherwise 



2 



arXiv: Untangling Phase and Time in Monophonic Sounds 




Figure 3: The first graph presents the lower part of the absolute 
spectrum of a piano sound. Its pitch is shifted 2 octaves down 
(factor 4) in the second graph. 



Choosing this kernel means WHITTAKER interpolation. 
Now we consider a complex wave of frequency a as in- 
put for the phase and time modification. 



x(t) 
a 
n 
b 

Mx(t) 



exp(27ri • a ■ i) 

b + n (4) 

Z 

f- 1 M 

V 2> 2> 

exp(27ri ■ (b ■ v + n ■ a) ■ t) (5) 



Note that for fraca = |, the WHITTAKER interpolation 
will diverge. If b — 0, that is the input frequency a is 
integral, then the time progression has no influence on the 
frequency mapping, i.e. the input frequency a is mapped 
to a ■ a. We should try to fit the input signal as good as 
possible to base frequency 1 by stretching or shrinking, 
since then all harmonics have integral frequency. 
The fact, that sine waves are mapped to sine waves, im- 
plies, that the effect of M to a more complex tone can be 
described entirely in frequency domain. An example of a 
pure pitch shift is depicted in Figure 3. The peaks corre- 
spond to the harmonics of the sound. We see that the peaks 
are only shifted. That is, the shape and width of each peak 
is maintained, meaning that the envelope of each harmonic 
is the same after pitch shifting. 
Preservation of envelope 

Consider a static wave x, i.e. Vt x(t) = x(t + 1), that is 
amplified according to an envelope /. If interpolation with 
k is able to reconstruct / and all of its translates from their 
respective integral values, then on the cylinder wave and 
envelope become separated 

Fx(t, <p) = /(*) ■ x(<p) 

and the overall phase and time manipulation algorithm 
modifies frequency and time separately: 

Mx{t) = f(y ■ t) ■ x(a ■ t) . 

Examples for n and / are: 

• k being the sinus cardinalis as defined in item 4 and 
/ being a signal bandlimited to (—5, §), 

• k = X(-i,o] an d / being constant, 



• n(t) = max(0, 1 — \t\) and / being a linear function, 

• k being an interpolation kernel, that preserves poly- 
nomial functions up to degree n and / being such a 
polynomial function. 



3. CONTINUOUS SIGNALS: THEORY 

In this section we want to give proofs of the statements found in 
Section 2 and we want to check what we could have done alter- 
natively given the properties that we found to be useful. You can 
safely skip the entire section if you are only interested in practical 
results and applications. 

3.1. Notation 

In order to give precise, concise, even intuitive proofs, we want to 
introduce some notations. 

In signal processing literature we find often a term like x(t) 
being called a signal, although from the context you derive, that 
actually x is the signal and thus x(t) denotes a displacement value 
of that signal at time t. We like to be more strict in our paper. We 
like to talk about signals as objects without always going down to 
the level of single signal values. Our notation should reflect this 
and should clearly differentiate between signals and signal values. 
This way, we can e.g. express a statement like "delay and convo- 
lution commute" by 

(a; * y) — » t = x * (y — > t) 

(cf. (22)) which would be more difficult in a pointwise and cor- 
rect (!) notation. 

This notation is inspired by functional programming, where 
functions that process functions are called higher-order functions. 
It allows us to translate the theory described here almost literally to 
functional programs and theorem prover modules. Actually some 
of the theorems stated in this paper have been verified using PVS 
[ ]. For a more detailed discussion of the notation, see [3]. 

In our notation function application has always higher prece- 
dence than infix operators. Thus Qx — * t means (Qx) — * t 
and not Q(x — * t). Function application is left associative, that 
is, Qx(t) means (Qx)(t) and not Q(x(t)). This is also the con- 
vention in Functional Analysis. We use anonymous functions, also 
known as lambda expressions. The expression x i— > Y denotes a 
function / where Va; f(x) = Y and Y is an expression that usu- 
ally contains x. Arithmetic infix operators like "+" and "■" shall 
have higher precedence than the mapping arrow, and logical infix 
operators like "=" and "A" shall have lower precedence. That is, 
t i-> f {t - t) = f -> r means (t h-> (f(t ~ r) + g(t - r))) = 

1 Definition (Function set). With 

B 

we like to denote the set of all functions mapping from set A 
to set B. This operation is treated right associative, that is, 
A -> B -> C means A -> (B -> C), not (A -» B) -> C. 
This convention matches the convention of left associative func- 
tion application. 
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3.2. Basic functions 

For the description of the cylinder we first need the notion of a 
cyclic quantity. 

2 Definition (Cyclic quantity). Intuitively spoken, cyclic (or peri- 
odic) quantities are values in the range [0, 1) that wrap around at 
the boundaries. More precisely, a cyclic quantity <p is a set of real 
numbers that all have the same fractional part. Put differently, a 
periodic quantity is an equivalence class with respect to the rela- 
tion, that two numbers are considered equivalent when their differ- 
ence is integral. In terms of a quotient space this can concisely be 
written as 

ip G E /z . 

3 Definition (Periodisation). Periodisation c means mapping a 
real value to a cyclic quantity, i.e. choosing the equivalence class 
belonging to a representative. 

c G R — > K /z 
Vp£ I c(p) = p + Z 

= {?:?-pe2} 

It holds c(0) = Z. We define the inverse of c as picking a repre- 
sentative from the range [0, 1). 

c _1 G K /z^R 

v<^G R / z c_1 (</?) e ¥>n[o,i) 

In a computer program, we do not encode the elements of R /z 
by sets of numbers, but instead we store a representative between 
and 1 , including and excluding 1 . Then c is just the function, 
that computes the fractional part, i.e. c t = t - floor t. 

A function y on the cylinder is thus from (R x M /z) — » V, 
where V denotes a vector space. E.g. for V = Rwe have a mono 
signal, for F = Ixlwe obtain a stereo signal and so on. 

The conversion S from the cylinder to an audio signal is en- 
tirely determined by given phase control curve g and shape control 
curve h. It consists of picking the values from the cylinder along 
the path that corresponds to these control curves. 

S h , g G ((R x R/z) — » F) — > (R — > V) (6) 
S h , g y(t) = y(h(t),g(t)) (7) 

For the conversion F from a prototype audio signal to a cylin- 
drical model we have a lot of freedom. In section Section 2.3 we 
have seen what properties a certain F has, that we use in our im- 
plementation. We will going on to check what choices for F we 
have, given that these properties hold. For now we will just record, 
that 

F G (8^V)^((Ix a /z) -> V) . 

3.3. Properties 

3.3.1. Time-Invariance 

4 Definition (Translation, Rotation). Shifting a signal x forward 
or backward in time or rotating a waveform with respect to its 
phase shall be expressed by an intuitive arrow notation that is in- 
spired by [ , ] and was already successfully applied in [3]: 

(x^r)(t) = x(t-r) (8) 
(x^r)(t) = x(t + r) . (9) 



For a cylindrical function we have two directions, one for rotation 
and one for translation. We define analogously 

(v->(T,a))(t,vO = y(t-T,tp-a) (10) 
{y*-(T,a))(t,<p) = y(t + T,<p + a) . (11) 

The first notion of time-invariance that comes to mind, can 
be easily expressed using the arrow notation by Vt F(x — > t) = 
Fx — * (t, c(0)). However, this will not yield any useful conver- 
sion. Shifting the time always includes shifting the phase and our 
notion of time-invariance must respect that. We have already given 
an according definition in (1) that we can now write using the ar- 
row notation. 

5 Definition (Time-invariant cylinder interpolation). We call an 
interpolation operator F time-invariant whenever it satisfies 

VzVi F(x -> t) = Fx ->• (t, c(i)) . (12) 

Using this definition, we do not only force F to map transla- 
tions to translations, but we also fix the factor of the translation 
distance to 1. That is, when shifting an input signal x, the accord- 
ing model Fx is shifted along the unit helix, that turns once per 
time difference 1. 

Enforcing the time-invariance property restricts our choice of 
F considerably. 

Fx(t,<p) 

= (Fx^(t,c(t)))(0,y>-c(t)) | (11) 
= F(x*-t){0,<p-c(t)) | (12) 

We see, that actually only a ring slice of F(x <— t) at time point 
zero is required and we can substitute Ix(cp) = Fx(Q, tp). I is an 
operator from (R — > V) — > ( R /z — > V), that turns a straight signal 
into a waveform. Now we know, that time-invariant interpolations 
can only be of the form 

Fx(t,<p) = I(x <- t)(<p - c(t)) (13) 
or more concisely 

ip i-> Fx(t, ip) = I(x <-£)-> c(t) . (14) 

The last line can be read as: In order to obtain a ring slice of the 
cylindrical model at time t, we have to move the signal, such that 
time point t becomes point 0, then apply / to get a waveform on a 
ring, then rotate back that ring correspondingly. 

We may check, that any F defined this way is indeed time- 
invariant in the sense of (12). 

F{x^t)(r, if) 



= /((I- 


t)«-T)fo>-c(T)) 


(13) 


= I(x <- i 


(r-t))fc-c(T)) 




= I(x <- I 


(r -*))(¥> -c(t)-c(T-t)) 




= Fi(r- 


•t,¥>-c(t)) 


(13) 


= (Fx ->• 


(t,c(t)))(T,vO 





3.3.2. Linearity 

We like that our phase and time modification process is linear (as 
in (2) and (3)). Since sampling S from the cylinder is linear, the 
interpolation F to the cylinder must be linear as well. 

Homogeneity F(\ ■ x) — A ■ Fx 
Additivity F(x + z) = Fx + Fz 
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an interpolation that maps sine waves to sine waves. Actually, the 
WHITTAKER interpolation has this property. 



Figure 4: Constant interpolation (below) of a sine wave (above) 
that is out of sync. The interpolation picture represents the surface 
of the cylinder after cutting and flattening. A black dot means 
y(t, ip) = — 1 and a white dot represents 1. The sine wave can be 
found in the interpolation image at the right border of each of the 
skew stripes. Along the vertical line from bottom to top you find 
the first period of the input signal, where "first " is measured from 
time point 0. 

The properties of F are equivalent to 

I(\ ■ x) = A • Ix 

I(x + z) = Ix + Iz . 

3.3.3. Static wave preservation 

Another natural property is, that an input signal consisting of a 
wave of constant shape is mapped to the cylinder where each 
ring contains that waveform. A static waveform can be writ- 
ten concisely as w o c. It denotes the function composition of 
w and c, that is, w is applied to the result of c, for example 
(w o c)(2.3) = iu(c(0.3)). Thus w and ibo c both represent 
periodic functions, but w has domain */z and thus is periodic by 
its type, whereas wo c is an ordinary real function, that happens 
to satisfy the periodicity property (aio c) = (mo c) ^ 1. We can 
write our requirement as 

MtMp F(w o c)(t, ip) = w(ip) . 

As an example we have a constant interpolation 

I(x) = x o c _1 
Fx(t,p) = x (t + c -1 (v? - c(t))) . 

We illustrate the constant interpolation in Figure 4, but with a sine 
wave, that does not have frequency 1 , and thus looks for the inter- 
polation operator F like a non-static waveform. This way, we can 
better demonstrate how constant interpolation works, and we think 
one can verify intuitively, how it preserves static waves. 

We can consider an input signal of the form wo c as a wave 
with constant envelope and we will generalise this to other en- 
velopes in Section 3.3.6. 

3.3.4. Mapping of pure sine waves 

We like to derive, how frequencies are mapped when converting 
from an audio signal to the cylindrical model and observing the 
signal along a different but uniform helix. To this end, we need 



sincl t = 



lim 

T — >t 



sin(Y ■ 7r) 
r ■ ty 
1 : t = 

otherwise 



q(t-7r) 



Fx(t,<p) = X ( T ) ' sincl (t — r) 



(15) 



Since p £ R /z, when r £ ip then r assumes all values that differ 
from c _1 (iy9) by an integer. The infinite sum ~}2 Teip f(r) shall be 

understood as lim,,^ E T ^ n [-„,n] f( T )- 

The proof of F being time-invariant according to Definition 5 
is deferred to Section 3.3.5, where we perform the proof for any 
interpolating kernel, not just sincl. 

We will now demonstrate, that sincl-interpolation preserves 
sine waves and how frequencies are mapped. 

Mapping a complex sine wave to the cylinder Since exponen- 
tial laws are much easier to cope with than addition theorems for 
sine and cosine, we use a complex wave defined by 

cisl t = exp(27ri • t) 

For the following derivation we need the WHITTAKER-SHANNON 
interpolation formula [ ] in the form 



Vbe (- 



cisl(& • k) ■ sincl(£ - k) = cisl(6 • t). (16) 



We choose a complex wave of frequency a as input for the con- 
version to the cylinder. The fractional frequency part b and the 
integral frequency n are chosen as in (4). 

x(t) = cisl(a • t) 
with a = b + n 

n e z 

b 6 (-!,!) 

This choice implies the following interpolation result 

Fx(t,ip) = cisl(q ■ r) ■ sincl(t — r) 

Vr G ip 

Fx(t, ip) = cisl(a • r) • cisl(a • k) ■ sincl(£ — r — k) 

because a — b G Z 

= cisl(a. ■ r) ■ cisl(6 ■ k) ■ sincl(t — 



kf-_Z 



= cisl(a. ■ r) ■ cisl(fe • (t — r)) 
= cisl(b ■ t + n ■ t) 
Fx(t,ip) = cisl (fe • t + n ■ c _1 (ip)) 



k) 
(16) 

(17) 



The result can be viewed in Figure 5. We obtain, that for every t 
the function on a ring slice p i— > Fx(t, <p) is a sine wave with the 
integral frequency n that is closest to a. That is, the closer a is to 
an integer, the more harmonics of a non-sine wave are mapped to 
corresponding harmonics in a ring slice of Fx. 
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Figure 5: The sine wave as in Figure 4 is interpolated by WHIT- 
TAKER interpolation. Along the diagonal lines you find the origi- 
nal sine wave. 

Mapping a complex wave from the cylinder to an audio signal 

For time progression speed v and frequency a we get 

z(t) = Fx(v ■ t,c(a- t)) 

= cisl(6 ■ v ■ t + n ■ c -1 (c(a ■ t))) 
because VreR c~ x (c(r)) - r G Z 

= cisl(& ■ v ■ t + n ■ a ■ t) 
= cisl((6 • v + n • a) • i) 

This proves (5). 

3.3.5. Interpolation using kernels 

Actually, for the two-dimensional interpolation F we can use any 
interpolation kernel k, not only sincl as in (15). 

Fx(t,ip) = ]P x(t) -Kit- t) (18) 

The constant interpolation corresponds to k = X(-i,o]- Linear 
interpolation is achieved using a hat function. 

6 Lemma (Time invariance of kernel interpolation). The operator 
F defined with an interpolation kernel as in (18) is time-invariant 
according to Definition 5. 

Proof. 

F{x^d)(t,<p) = d)(T)-«(t-T) 

= X(T - d) ■ K((t - d) - {T - dj) 

— X ( T ) ' K (t — d — t) 

rS(¥>-c(d)) 

= (Fx^ (d,c(d)))(t,<p) 

□ 

Conversely, we like to note, that kernel interpolation is not the 
most general form when we only require time-invariance, linearity 
and static wave preservation. 

The following considerations are simplified by rewriting gen- 
eral kernel interpolation to a more functional style using a discreti- 
sation operator and a mixed discrete/continuous convolution. 

7 Definition (Quantisation). With quantisation we mean the op- 
eration that picks the signal values at integral time points from a 
continuous signal. 

Q e (R -> V) -> (Z -> V) 

VnGZ Qx(n) = x(n) (19) 



Here is, how quantisation operates on pointwise multiplied 
signals and on periodic signals: 

Q{x-z) = Qx-Qz (20) 
VnGZ Q(woc)(n) = tu(c(0)) . (21) 

8 Definition (Mixed Convolution). For u G 1 — > V and x G 
E — > R then mixed discrete/continuous convolution is defined by 

(u*x)(t) = ^ tt(fc) • x(t - k) 

fcgZ 

We can express mixed convolution also by purely discrete con- 
volutions: 

Q((u * x) <— t) — u * Q(x <— t) 

It holds 

(u * x) — > t — u * (a; — > t), (22) 

because translation can be written as convolution with a translated 
DlRAC impulse and convolution is associative in this case (and 
generally when infinity does not cause problems). Thus we will 
omit the parentheses. We like to note, that this example demon- 
strates the usefulness of the functional notation, since without it 
even a simple statement like (22) is hard to formulate in a correct 
and unambiguous way. 

These notions allow us to rewrite kernel interpolation (18): 

Vr G ifi Fx(t,<p) = ^ x(k + t) • n(t - (fc + t)) 

fcez 

Vr G <p t h-» Fx(t, tp) = Q(x < — t ) # ft — ► t . (23) 

The last line can be read as follows: The signal on the cylinder 
along a line parallel to the time axis can be obtained by taking 
discrete points of x and interpolate them using the kernel k. 

3.3.6. Envelope preservation 

We can now generalise the preservation of static waves from Sec- 
tion 3.3.3 to envelopes different from a constant function. 

9 Lemma. Given an envelope / from R — > R and an interpolation 
kernel k that preserves any translated version of /, i.e. 

Vt 0(/ <-«)*« = /«-*, (24) 

then and only then, a wave of constant shape w enveloped by / is 
converted to constant waveshapes on the cylinder rings enveloped 
by / in time direction: 

F(f -(wo c))(t,<p) = /(*)•«>(» . (25) 

Proof. 

\freip F(f ■ (wo c))(t,(p) 

— Q((f - (wo c)) < — t) * n — > t 

= Q((f *- T ) ■ [(w «- ¥>) ° c )) * K -> t 

= w(<p) • Q(f r) * K -> r | (20,21,9) 

Now the implication (24) =>• (25) should be obvious, whereas the 
converse (25) =S> (24) can be verified by setting Vip to(y>) = 1. 
This special case means that the envelope / used as input signal is 
preserved in the sense 

Ff(t,<p) = /(*) . 

□ 
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10 Corollary. When we convert back to a one-dimensional audio 
signal under the condition (24), then the time control only affects 
the envelope and the phase control only affects the pitch: 

S h , g (F{f- (wo c))) = (/ oh)-(wog) . 

3. 3. 7. Special cases 

As stated in item 3 of Section 2.3 we like to have resampling as 
special case of our phase and time manipulation algorithm. It turns 
out, that this property is equivalent to putting the input signal x on 
the diagonal lines as in Figure 4 and Figure 5. We will derive, what 
this imposes on the choice of the kernel k when F is denned via a 
kernel as in (23). 

11 Lemma. For F defined by 

Vr G ip 1 \— ► Fx(t, if) — Q(x <— t) * k — > r 

it holds 

VzVteR x(t) = Fx(t,c(t)) 

if and only if 

Qk — 5, 

that is, k is a so called interpolating kernel. 

Here, 5 is the discrete DlRAC impulse, that is 



(26) 



Proof. "=J 

VzVt G 



Vfc g Z S(k) = 



x(t) 



k = 
otherwise 



Fx(t,c(f)) 

= (Q(x <- t) * K -> t)(t) 
consider only f £ Z and rename it to k 
Vx Vfc G Z x(k) = (Q(x <-k)*K^> k)(k) 
= (Qx * k)(k) 
Va; Qx = Q(Qx * k) 

= Qx * Qk (discrete convolution) 
For Qx = 8 we get 5 = 5* Qk = Qk. 



Conversely, every interpolating kernel k asserts (26): 

VzVtGR (Q(x <- i) * k -> t)(i) 

= (Q(z<-t)*re)(0) 

= Q(Q(ar<-t)*K)(0) 

= (Q(arf-t)*Q«;)(0) 

= (Q(a:f-t)*<5)(0) 

= (*«-t)(0) 

= a;(t) . 



(8) 

(19) 

(22) 



□ 



Now, when our conversion from the cylinder to the one- 
dimensional signal does only walk along the unit helix, we get 
general time warping as special case of our method: 

S h ,ooh(Fx) = t~Fx(h(t), c{h(t))) | (7) 
= t h-> x(h(t)) | (26) 

= x o h 

For h = id we get the identity mapping, for hit) = v ■ t we get 
resampling by speed factor v. 




Figure 6: Mapping of the sampled values to the cylinder in our 
method. The variables s and I are coordinates in the skew coordi- 
nate system. 



4. DISCRETE SIGNALS 

For the application of our method to sampled signals we could 
interpolate a discrete signal it containing a wave with period T, 
thus getting a continuous signal x with as(S) = u(n) and proceed 
with the technique for continuous signals from Section 2. How- 
ever, when working out the interpolation this yields a skew grid 
with two alternating cell heights and a doubled number of paral- 
lelogram cells, which seems to be unnatural to us. Additionally it 
would require three distinct interpolations, e.g. two distinct inter- 
polations in the unit helix direction and one interpolation in time 
direction. Instead we want to propose a periodic scheme where 
we need two interpolations with the same parameters in unit helix 
("step") direction and one interpolation in the skew "leap" direc- 
tion. This interpolation scheme is also time-invariant in the sense 
of item 1 in Section 2.3 and Definition 5 when we restrict the trans- 
lation distances to multiples of the sampling period. 

The proposed scheme is shown in Figure 6. We have a skew 
coordinate system with steps s and leaps /, We see, that this 
scheme can cope with non-integral wave periods, that is, T can 
be a fraction (in Figure 6 we have T = ^A). Whenever the wave 
period is integral, the leap direction coincides with the time direc- 
tion. The grid nicely matches the periodic nature of the phase. The 
cyclic phase yields ambiguities, e.g. a leap could also go to where 
I' is placed, since this denotes the same signal value. We will later 
see, that this ambiguity is only temporary and will vanish at the end 
(29). Thus we use the unique representative c -1 (y>) of ip. To get 
(I, s) from (t, c _1 (tp)) we have to convert the coordinate systems, 
i.e. we have to solve the simultaneous linear equations 



1 I roundT l\ (l 
T ' ground T-T l) ' \s 



where round is any rounding function we like. E.g. in Figure 6 it 
is round T = 4, Its solution is 



I — t — c ((fi) 

s = t-T-l- round T 



(27) 



Using the interpolated input x we may interpolate y linearly 



r = L^J ' round T + s 

lerp(£,»?)(A) = £ + A •(*?-£) 

frac A = A — |_ AJ 

y{t,<p) = lerp (x($),x( r+ ™" dT )) (fract) 



(28) 
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or more detailed 

n = round T + [s\ 

a = lerp(«(n), u(n + l))(fracs) 
b = lerp(u(n + roundT), u(n + roundT + 1)) 
(frac s) 

y(t,tp) = lerp(a, 6)(frac /) 

Actually, we do not even need to compute s since by expansion 
of s the formula for r can be simplified and it is frac s = frac r. 
From I we actually only need frac This proves, that every repre- 
sentative of ip could be used in (27). 

r = t-T- frac/ -roundT (29) 

n — [_r\ 

a — lerp(u(n), u(n + 1 ))(frac r) 

b — lerp(u(n + round T), u(n + roundT + f))(frac r). 

4.1. General Interpolations 

Other interpolations than the linear one use the same computations 
to get frac I and r, but they access more values in the environment 
of n, i.e. u(n + j + k ■ round T) for some j and k. E.g. for 
linear interpolation in the step direction and cubic interpolation in 
the leap direction, it is j G {0, f }, k € {-1, 0, f , 2}. 

4.2. Coping with Boundaries 

So far we have considered only signals that are infinite in both time 
directions. When switching to signals with finite time domain we 
become aware that our method consumes more data than it pro- 
duces at the boundaries. This is however true for all interpolation 
methods. 

We start considering linear interpolation: In order to have a 
value for any phase at a given time, a complete vertical bar must 
be covered by interpolation cells. That happens the first time at 
time point I. The same consideration is true for the end of the sig- 
nal. That is, our method always reduces the signal by two waves. 
Analogously, for k node interpolation in leap direction we lose k 
waves by pitch shifting. 

If we would use extrapolation at the boundaries, then for the 
same time but different phases we would sometimes have to inter- 
polate and sometimes we would extrapolate. In order to avoid this, 
we just alter any t 6 [0, 1) to t = 1 and limit t accordingly at the 
end of the signal. 

4.3. Efficiency 

The algorithm for interpolating a value on the cylinder is actually 
very efficient. The computation of the interpolation parameters 
and signal value indices in (29) needs constant time, and the in- 
terpolation is proportional to the number of nodes in step direc- 
tion and the number of nodes in leap direction. Thus for a given 
interpolation type, generating an audio signal from the cylinder 
model needs time proportional to the signal length and only con- 
stant memory additional to the signal storage. 

4.4. Implementation 

A reference implementation of the developed algorithm is writ- 
ten in the purely functional programming language Haskell [7]. 



The tree of modules is located at http:// dares, haskell. 
org/ synthesizer/ src/. In [ ] we have already shown, how 
this language fulfils the needs of signal processing. The absence 
of side effects makes functional programming perfect for paral- 
lelisation. Recent progress on parallelisation in Haskell [ ] and 
the now wide availability of multi-core machines in the consumer 
market justifies this choice. 

We can generate the cylindrical wave function with the func- 
tion Synthesizer . Basic . Wave . sampledTone given the 
interpolation in leap direction, the interpolation in step direc- 
tion, the wave period of the input signal and the input sig- 
nal. The result of this function can then be used as in- 
put for an oscillator that supports parametrised waveforms, like 
Synthesizer . Plain . Oscillator . shapeMod. By the 
way, this implementation again shows, how functional program- 
ming with higher order functions supports modularisation: The 
shape modulating oscillator can be used for any other kind of 
parametrised waveform, e.g. waveforms given by analytical func- 
tions. This way, we have actually rendered the tones with morph- 
ing shape in the figures of this paper. In an imperative language 
you would certainly call the waveform being implemented as call- 
back function. However due to aggressive inlining the compiled 
program does not actually need to callback the waveform function 
but the whole oscillator process is expanded to a single loop. 



4.5. Streaming 

Due to its lazy nature, Haskell allows simple implementation 
of streaming, that is, data is processed as it comes in, and thus 
processing consumes only a constant amount of memory. If we 
apply our pitch shifting and time stretching algorithm to an as- 
cending sequence of time values, streaming is possible. This ap- 
plies, since it is warranted, that ^ is not too far away from t. Since 
frac I € [0, 1) it holds 



round T 



(30) 



Thus we can safely move our focus to t-T— round T in the discrete 
input signal u, which is equivalent to a combined translation and 
turning of the wave function on the cylinder. 

What makes the implementation complicated is the handling 
of boundaries. At the beginning we limit the time parameter as 
described in Section 4.2. However at the end, we have to make 
sure that there is enough data for interpolation. It is not so simple 
to limit t to the length of input signal minus size of data needed 
for interpolation, since determining the length of the input signal 
means reading it until the end. Instead when moving the focus, we 
only move as far as there is enough data available for interpola- 
tion. The function is implemented by Synthesizer .Plain . 
Oscillator . shapeFreqModFromSampledTone. 



5. APPLICATIONS 

5.1. Combined pitch shifting and time scaling 

With a frequency control curve / and a shape control g we get 
combined pitch shifting and time scaling out of our model using 



the conversion S 



ff, g ( see ( ? )>- 
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Figure 7: Pitch shifting performed on the signal of Figure 1 us- 
ing linear interpolation in both directions. Above is the result of 
wavetable synthesis, below is the result of our method. 



5.2. Wavetable synthesis 

Our algorithm might be used as alternative to wavetable synthesis 
in sampling synthesisers [10]. For wavetable synthesis a mono- 
phonic sound is reduced to a set of waveforms, that is stored in 
the synthesiser. On replay the synthesiser plays those waveforms 
successively in small loops, maybe fading from one waveform to 
the next one. If we do not reduce the set of waveforms, but just 
chop the input signal into wave periods, then apply wavetable syn- 
thesis with fading between waveforms, we have something very 
similar to our method. In Figure 7 we compare wavetable synthe- 
sis and our algorithm using the introductory example of Figure 1 . 
In this example both the wavetable synthesis and our method per- 
form equally well. If not stated otherwise, in this and all other 
figures we use linear interpolation. This minimises artifacts from 
boundary handling and the results are good enough. 




2 4 6 8 10 12 14 16 18 20 




2 4 6 8 10 12 14 16 18 20 




2 4 6 8 10 12 14 16 18 20 



5.3. Compression 

Wavetable synthesis can be viewed as a compression scheme: 
Sounds are saved in the compressed form of a few waves in the 
wavetable synthesiser and are decompressed in realtime when 
playing the sound. Analogously we can employ our method for 
compression of monophonic sounds. For compression we simply 
shrink the time scale and for decompression we stretch it by the 
reciprocal factor. An example is given in Figure 8. 

The shrinking factor, and thus the compression factor, is lim- 
ited by non-harmonic frequencies. These are always present in 
order to generate envelopes or phasing effects. Consider the fre- 
quency a that is decomposed into b + n as in (4), no pitch shift, i.e. 
a = 1, and the shrinking factor v. According to (5), the frequency 
b+n is mapped to b-v+n. In order to be able to decompose b-v+n 
into b- v and n again on decompression, it must be b- v £ (— |, |). 
This implies, that if b is the maximum absolute deviation from an 
integral frequency, that you want to be able to reconstruct, then it 
must be v < . 

The mapping of frequencies can be best visualised using the 
frequency spectrum as in Figure 9. Note how the peaks become 
wider by the compression factor while their shape is maintained. 
The resolution is divided by the compression factor, and this is 
why the compressed data actually consumes less space. The shape 
of a peak expresses the envelope of the according harmonic and 
widening it, means a time shrunken envelope. 



Figure 8: We show how a piano sound is altered by compression 
and decompression. The top-most graph is the original sound. The 
graphs below are the results of compression and decompression 
with cubic interpolation by the associated factors in the left col- 
umn. Because the interpolation needs a margin at beginning, we 
have copied the first two periods when compressing and decom- 
pressing. 



- 
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Figure 9: The first graph presents the lower part of the absolute 
spectrum of a piano sound. This is then compressed by a factor 4 
in the second graph. 
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If we compress too much, then peaks will overlap and we get 
aliasing effects on decompression. Aliasing can be suppressed by 
smoothing across the same phase of all waves. That is, for the 
monophonic sound x with period T and a smoothing filter window 
w, we should compress x * (to "f round T) instead of x. We use 
the up arrow for the upsampling operator where 



Vftc}cZ (w]c) k = 




: k = mod c 
: k ^ mod c 



Actually, we could use the frequency spectrum not only for 
visualising the compression (or pitch-shifting), but we could also 
use the frequency spectrum itself for compression. The advantages 
would be simpler anti-aliasing (we would just throw away values 
outside bands around the harmonics) and we could also strip high 
harmonics, once they fall below a given threshold. The advantage 
of computing in the time-domain is, that it consumes only linear 
time with respect to the signal length, not linear-logarithmic time 
like the FOURIER transform, that it can be applied in a streaming 
way and allows to adapt the compression factor to local charac- 
teristics of a sound. For instance, you may use a shrinking factor 
close to 1 for fast varying portions of the signal and use a larger 
shrinking factor on slowly modulated portions. 

5.4. Loop sampled sounds 

Another way to save memory in sampling synthesisers is to loop 
sounds. This is especially important in order to get infinite sounds 
like string sounds out of a finite storage. Looping means to repeat 
portions of a sampled sound. The problem is to find positions of 
matching sound characteristics: A loop that causes a jump or an 
abrupt change of the waveform is a nasty audible artifact. Espe- 
cially in samples of natural sounds there might be no such match- 
ing positions, at all. Then the question is, whether the sample can 
be modified in a way that preserves the sound but provides fine 
loop boundaries. Several solutions using fading or time reversal 
have been proposed. 

Our method offers a new way: We may move the time forth 
and back while keeping pitch constant. In Figure 10 we show two 
reasonable time control curves. Both control curves start with ex- 
actly reproducing the sampled sound and then smoothly enter a 
cycle. Actually, we copy the first part verbatim instead of running 
time stretching with factor 1, since our method cannot reproduce 
the beginning of the sound due to interpolation margins. The cycle 
of the first control curve consists of a sine, that warrants smooth 
changes of the time line. However with this control, interferences 
are prolonged at the loop boundaries, which is clearly audible. It 
turns out that the second control curve, namely the zig-zag curve, 
sounds better. It preserves any chorus effect and the change of the 
time direction is not as bad as expected. 

A nice property of this approach is, that the loop duration is 
doubled with respect to the actually looped data. In contrast to 
that, a loop body generated by simple cross-fading of parts of the 
sound, say, with a VON HANN window, would half the loop body 
size and sounds more hectically. 

Since the time control affects only the waveform, it is war- 
ranted that at the cycle boundaries of the time control the wave- 
forms of the time manipulated sound match, too. In order to assert 
the also the phases match you have to choose a time control cycle 
length that is an integral multiple of the wave period. 




o ^ 1 1 1 1 1 1 1 1 1 1 

200 400 600 800 1000 1200 1400 1600 1800 2000 



Figure 10: Two possible time control curves for generating a 
loopable portion of a sampled sound. 
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0.002 0.004 0.006 0.008 0.01 0.012 

Figure 1 1 : Echolocation call of Nyctalus noctula. The time values 
are seconds. 



5.5. Making inaudible harmonics audible 

Remember, that our model does not preserve formants. Another 
application, where this is appropriate, is to process sounds, where 
formants are not audible anyway, namely ultrasound signals. Our 
method can be used, to make monophonic ultrasound signals au- 
dible by decreasing the pitch and while maintaining the length. In 
Figure 1 1 we show an echolocation call of a bat. It is a chirp from 
about 35 kHz to 25 kHz sampled at 441 kHz. The chirp nature 
does not match the requirements of our algorithm, so it is not easy 
to choose a base frequency. We have chosen 25 kHz and divide the 
frequency by factor 5 while maintaining the length. Unfortunately 
the waves have no special form that we can preserve. So this exam- 
ple might serve a demonstration of the robustness of our algorithm 
with respect to non-harmonic frequencies and the preservation of 
the envelope. In the same way our method might be used to in- 
crease the pitch of infrasound. 



5.6. FM synthesis 

Since we can choose the phase parameter per sample, we can not 
only do regular pitch shifting, but we can also apply FM synthe- 
sis effects [ ]. An FM effect alone could also be achieved with 
synchronised time warping, however with our method we can per- 
form pitch shifting, time scaling and FM synthesis in one go. See 
Figure 12 for an example. 
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Figure 12: Above is a sine wave that is distorted by v i— ► sgnw ■ 
\v\ p forp running from | to 4. Below we applied our pitch shifting 
algorithm in order to increase the pitch and change the waveshape 
by modulating the phase with a sine wave of the target frequency. 
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Figure 13: A tone generated from pink noise by time stretching. 
The source and the target period are equal. The time is stretched 
by factor 4. 



5.7. Tone generation by time stretching 

The inability to reproduce noise can be used for creative effects. 
By time stretching we can get a tone out of every sound. This is 
exemplified in Figure 13. If we stretch time by a factor n for a spe- 
cific period T (source and target period shall be equal), then in the 
spectrum the peak for each harmonic of frequency ^ is narrowed 
by a factor n. 

6. RELATED WORK 

The idea of separating parameters (here phase and shape) that are 
in principle indistinguishable is not new. For example it is used 
in [12] for separation of sine waves of considerably different fre- 
quencies. This way a numerically problematic ordinary differen- 
tial equation is turned into a well-behaved partial differential equa- 
tion. 

Also the specific tasks of pitch shifting and time scaling are ad- 
dressed by a broad range of algorithms [ ]. Some of them are in- 
tended for application on complex music signals and are relatively 
simple, like "Overlap and Add" (OLA), "Synchronous Overlap 
and Add" (SOLA) [14, 15], or the three-phase overlap algorithm 
using cosine windows presented in [ ]. They take segments of an 
audio signal as they are, rearrange them and reduce the artifacts of 
the new composition. Other methods are based on a model of the 
sound. E.g. "pitch-synchronous overlap-add" (PSOLA) is roughly 
based on the excitation+filter model for speech [17, 18, 19], sinu- 
soidal models interpret sounds as mixture of sine waves that are 
modulated in amplitude and frequency [20], even more sophisti- 
cated models treat sounds as mix of sine waves, transients and a 
residual [ ]. There are also methods specific to monophonic sig- 
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Figure 14: Mapping of the sampled values to the cylinder in the 
wavetable-oscillator method. The grey numbers are the time points 
in the input signal. 



nals, like wavetable synthesis [ ] and advanced methods, that can 
cope with frequency modulated input signals [22]. 

In the following two sections we like to compare our method 
with the two methods that are most similar to the one we intro- 
duced here, namely with wavetable synthesis and PSOLA. 



6.1. Comparison with Wavetable Synthesis 

When we chop our input signal into wave periods and use the 
waves as wavetable, then wavetable synthesis becomes rather sim- 
ilar to our method [ ]. Wavetable synthesis also preserves wave- 
forms, rather than formants, it allows frequency and shape modu- 
lation at sample rate. However, due to the treatment of waveforms 
as discrete objects, the wavetable synthesis cannot cope well with 
non-harmonic frequencies (Figure 16). Thus, in wavetable synthe- 
sisers, phasing is usually implemented using multiple wavetable 
oscillators. A minor deficiency is, that fractional periods of the 
input signal are not supported. The wavetables always have to 
have an integral length. We consider this deficiency to be not so 
important, since when we do not match the wave period exactly, 
this will appear to the wavetable synthesis algorithm as a shifting 
waveform. But that algorithm must handle varying waveshapes 
anyway. 

The wavetables in a wavetable synthesiser are usually created 
by a more sophisticated preprocessing than just chopping a signal 
into pieces of equal length. However, for comparison purposes we 
will just use this simple procedure. 

Chopping and subsequent wavetable synthesis can also be in- 
terpreted as placing the sample values on a cylinder and interpo- 
lating between them. It yields the pattern shown in Figure 14. The 
variable s denotes the "step" direction, which coincides with the 
direction of the phase in this scheme. The variable I denotes the 
"leap" direction, which coincides with the time direction. In order 
to fit the requirement of a wave period of 1 we shrink the discrete 
input signal. Say, the discrete input signal is it, the wave period is 
T, that must be integral, and the real input signal is x, that we de- 
fine at some discrete fractional points by x{^) = u(n) and at the 
other ones by interpolation. In Figure 14 it is T = 4 and for exam- 
ple ?/(1.7, c(0.6)) is located in the rectangle spanned by the time 
points 6, 7, 10, 11. For simplicity let us use linear interpolation as 
in (28). We would interpolate 

2/(1.7) (c(0.6)) = 

lerp(lerp(u(6),«(7))(0.4),lerp(w(10),M(ll))(0.4))(0.7). 
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In general for y(t, ip) we get 

Vr e 1 frac r — r — \r\ 

VreR x(%) = \erp(u([r\),u{[r\ + l))(fracr) 

T = Ltj+C" 1 ^) 

y(t,tp) = lerp(x(r), x(t + l))(fraci) 



or more detailed 

s — 
n — 



T.[tJ + |*J 

lerp(w(n), w(n + l))(frac s) 
lerp(u(n + T), u(n + T + l))(fracs)) 
lerp(a, 6) (frac f). 



6 = 

The handling of waveform boundaries points us to a problem of 
this method: Also at the waveform boundaries we interpolate bet- 
ween adjacent values of the input signal u. That is, we do not wrap 
around. This way, waveforms can become discontinuous by inter- 
polation. We could as well wrap around the indices at waveform 
boundaries. This would complicate the computation and raises the 
question, what values should naturally be considered neighbours. 
We remember, that we also have the ambiguity of phase values 
in our method. But there, the ambiguity vanishes in a subsequent 
step. 

6.1.1. Boundaries 

If we have an input signal of n wave periods, then we have only 
n— 1 sections where we can interpolate linearly. Letting alone that 
this approach cannot reconstruct a given signal, it loses one wave 
at the end for linear interpolation. If there is no integral number of 
waves, than we may lose up to (but excluding) two waves. For in- 
terpolation between k nodes in time direction we lose k — 1 waves. 
Of course, we could extrapolate, but this is generally problematic. 

That is, the wavetable oscillator cuts away between one and 
two waves, whereas our method always reduces the signal by two 
waves. Thus the wavetable oscillator is slightly more economic. 

6.2. Comparison with PSOLA 

Especially for speech processing, we would have to preserve for- 
mants rather than waveshapes. The standard method for this appli- 
cation is "(Time Domain) Pitch-Synchronous Overlap/ Add" (TD- 
PSOLA) [17, 18]. PSOLA decomposes a signal into wave atoms, 
that are rearranged and mixed while maintaining their time scale. 
The modulation of the timbre and the pitch can only be done at 
wave rate. As for wavetable synthesis it is also true for PSOLA, 
that due to the discrete handling of waveforms, non-harmonic fre- 
quencies are not handled well. 

Incidentally, time shrinking at constant pitch with our method 
is similar to PSOLA of a monophonic sound. For time shrinking 
with factor v and interpolating with kernel ft our algorithm com- 
putes: 

z(t) = y(v-t,c(t)) 

= y\(f + k) ■ k(v ■ t - (t + ft)) 



y~] x (t + k ) ■ «((« - 1) • t - k) 





Figure 15: Pitch shifting performed on a periodically amplitude 
modulated tone using linear interpolation. The figures show from 
top to bottom: The input signal, the signal recomputed with a dif- 
ferent pitch ( that is, the ideal result of a pitch shifter), the result of 
wavetable oscillating, the result of our method. 



= E 



(x^k)- ((K->k) i (V-1)) 



with (k I d)(t) = n(d- 1) 



We see that the interpolation kernel k acts like the segment window 
in PSOLA, but it is applied to different phases of the waves. For 
v = 1, only the non-translated x is passed to the output. 

Intuitively we can say, that PSOLA is source oriented or push- 
driven, since it dissects the input signal into segments independent 
from what kind of output is requested. Then it computes, where 
to put these segments in the output. In these terms, our method 
is target oriented or pull-driven, as it investigates for every output 
value, where it can get the data for its construction from. 

Actually, it would be easy to add another parameter to PSOLA 
for time stretching the atoms. This way one could interpolate bet- 
ween shape preservation and formant preservation. 

7. RESULTS AND COMPARISONS 

Finally we like to show some more results of our method and com- 
pare them with the wavetable synthesis. 

In Figure 15 we show, that signals with band-limited ampli- 
tude modulation can be perfectly reconstructed, except at the boun- 
daries. Although we do not employ WHITTAKER interpolation but 
simple linear interpolation the result is convincing. 

In Figure 16 we apply our method to a sine with a frequency 
that is clearly distinct from 1. To a monophonic pitch shifter this 
looks like a rapidly changing waveform. As derived for WHIT- 
TAKER interpolation in (17) our method can at least reconstruct 
the sine shape, however the frequency of the pitch shifted signal 
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Figure 16: Pitch shifting performed on a sine tone with a fre- 
quency that deviates from the required frequency 1, The graphs 
are arranged analogously to Figure 15. 

differs from the intended one. Again, the used linear interpolation 
does not seem to be substantially worse. 

We also like to show how phase modulation at sample rate 
can be used for FM synthesis combined with pitch shifting. In 
Figure 17 we use a sine wave with changing distortion as input, 
whereas in Figure 1 8 the sine wave is not distorted, but detuned to 
frequency 1.2, which must be treated as changing waveform with 
respect to frequency 1. 

As a kind of counterexample we demonstrate in Figure 19, 
how the boundary handling forces our method to limit the time 
parameter to values above 1 and thus it cannot reproduce the be- 
ginning of the sound properly. For completeness we also present 
the same sound transposed by PSOLA in Figure 20. 

Please note that the examples have a small number of periods 
(7 to 10) compared to signals of real instruments (say, 200 to 2000 
per second). On the one hand, graphs of real world sounds would 
not fit on the pages of this journal at a reasonable resolution. On 
the other hand, only for those small numbers of periods we get a 
visible difference between the methods we compare here. How- 
ever, if you are going to implement a single tone pitch shifter from 
scratch you might prefer our method, because it handles the cor- 
ner cases better and the complexity is comparable to that of the 
wavetable oscillator. Also for theoretical considerations we rec- 
ommend our method since it exposes the nice properties presented 
in Section 2. 

7.1. Conclusions 

We shall note that despite the differences between our method and 
existing ones, many of the properties discussed in Section 2.3 hold 
approximately also for the existing methods. Thus the worth of 
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Figure 17: Above is a sine wave that is distorted by v i— ► sgnu ■ 
\v\ p forp running from = to 4. Below we applied our pitch shifting 
algorithm in order to increase the pitch and change the waveshape 
by modulating the phase with a sine wave of the target frequency. 
The graphs are arranged analogously to Figure 15. 
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Figure 18: Here we demonstrate FM synthesis where the carrier 
sine wave is detuned. The graphs are arranged analogously to 
Figure 15. 
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Figure 19: Pitch shifting performed on a percussive tone. The 
graphs are arranged analogously to Figure 15. 
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Figure 20: Pitch shifting with the tone from Figure 19 that pre- 
serves formants performed by PSOLA. 



our work is certainly to contribute a model where these properties 
apply exactly. This should serve a good foundation for further 
development of a sound theory of pitch shifting and time scaling. 
It also pays off, when it comes to corner cases, like FM synthesis 
as extreme pitch shifting. 

8. OUTLOOK 

8.1. Band Limitation 

In our paper we have omitted how to avoid aliasing effects in pitch 
shifting caused by too high harmonics in the waveforms. In some 
way we have to band-limit the waveforms. Again, we should do 
this without actually constructing the two-dimensional cylindrical 
function. When we use interpolation that does not extend the fre- 
quency band, that is imposed by the discrete input signal, then it 
should be fine to lowpass filter the input signal before converting to 
the cylinder. The cut-off frequency must be dynamically adapted 
to the frequency modulation used on conversion from the cylinder 
to the audio signal. 

8.2. Irregular Interpolation 

We could also handle input of varying pitch. We would then need 
a function of time describing the frequency modulation which is 
used to place the signal nodes at the cylinder. This would be an 
irregular pattern and renders the whole theory of Section 3 useless. 
We had to choose a generalised 2D interpolation scheme. 
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A. AUTOMATED PROOFS WITH PVS 

The goal of proof assistants is currently not to simplify proving, 
but to get confidence that a claim is true. Actually, you will suc- 
ceed with a proof only with a profound understanding of the prob- 
lem and preferably several proof ideas, of which only one can be 
enough formalised such that the proof assistant accepts it. 



Displacement: TYPE = real 
Time: TYPE = real 
Phase: TYPE = 

Quotient (LAMBDA (pO, pi): 

integer? (pi - pO)) 
Signal: TYPE = [Time -> Displacement] 
Waveform: TYPE = [Phase -> Displacement] 
Tube: TYPE = [Time -> Waveform] 

t : VAR Time 

x: VAR Signal 

F: VAR [Signal -> Tube] 

I: VAR [Signal -> Waveform] 

IS (I) (x) (t) : Waveform = 

rotate_right (t) (I (translate_lef t (t) (x) ) ) 

time_invariant? (F) : bool = 
FORALL x, t: 

F (translate_right (t) (x) ) = 
translate2 (t, t) (F (x) ) 

interpolation_time_invariant : LEMMA 
time_invariant ? (IS (I) ) 

interpolation_slice : LEMMA 
time_invariant ? (F ) => 
(EXISTS I : F = IS (I) ) 

Figure 2 1 : Excerpt from a PVS module containing two statements: 
The first claim is that the interpolation of the form given in (14) is 
time-invariant in the sense of Definition 5. The second claim is that 
all time-invariant interpolations can he expressed in that form. In 
contrast to the PVS language, the according proof script can only 
be understood when interactively running it step by step in PVS 
and looking at how the expressions evolve. 

To give an impression of automated proving, we show the 
derivation of time-invariant interpolations from Section 3.3.1 
expressed by two lemmas in PVS [2] in Figure 21. See 

http://darcs. haskell .org/ synthesizer /src/ 
Synthesizer/Plain/ToneModulation/ for the accord- 
ing modules. The lemma, that constant interpolation preserves 
static waves is shown in Figure 22. See Section 3.3.3 for details. 
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w: VAR Waveform 

constant_tube? (y) : bool = 

FORALL tO, tl: y(tO) = y(tl) 

interpolation_constant : LEMMA 
FORALL w: constant_tube? 

(IS (LAMBDA x: x o cinv) (wo c) ) 

Figure 22: PVS lemma that claims that the constant interpolation 
preserves static waves. 
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